Exploiting Locality in Sparse Matrix-Matrix Multiplication on Many-Core Architectures
نویسندگان
چکیده
منابع مشابه
Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures
Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrixmatrix multiplication with a focus on performance portability across different high performance computing archite...
متن کاملAccelerating Sparse Matrix Vector Multiplication on Many-Core GPUs
Many-core GPUs provide high computing ability and substantial bandwidth; however, optimizing irregular applications like SpMV on GPUs becomes a difficult but meaningful task. In this paper, we propose a novel method to improve the performance of SpMV on GPUs. A new storage format called HYB-R is proposed to exploit GPU architecture more efficiently. The COO portion of the matrix is partitioned ...
متن کاملExploiting Locality on Parallel Sparse Matrix Computations
By now, irregular problems are di cult to parallelize in an automatic way because of their lack of regularity in data access patterns. Most times, programmers must hand-write a particular solution for each problem separately. In this paper we present two pseudo-regular distributions which can be applied to partition most problems achieving very good average case distributions. Also, we have des...
متن کاملcient Sparse Matrix - Matrix Multiplication on Multicore Architectures ⇤
We describe a new parallel sparse matrix-matrix multiplication algorithm in shared memory using a quadtree decomposition. Our implementation is nearly as fast as the best sequential method on one core, and scales quite well to multiple cores.
متن کاملEfficient Sparse Matrix-Matrix Multiplication on Multicore Architectures∗
We describe a new parallel sparse matrix-matrix multiplication algorithm in shared memory using a quadtree decomposition. Our preliminary implementation is nearly as fast as the best sequential method on one core, and scales well to multiple cores.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: IEEE Transactions on Parallel and Distributed Systems
سال: 2017
ISSN: 1045-9219
DOI: 10.1109/tpds.2017.2656893